
A Debiased MDI Feature Importance Measure for Random Forests

Neural Information Processing Systems

In particular, interpreting Random Forests (RFs) [2] and their variants [14, 28, 27, 29, 1, 12] has become an important area of research due to the wide-ranging applications of RFs in various scientific areas, such as genome-wide association studies (GWAS) [7], gene expression microarrays [13, 23], and gene regulatory networks [9].





comments, we organize our responses as follows

Neural Information Processing Systems

We thank the reviewers for their valuable feedback, which will significantly improve our paper. This is indeed a limitation of Theorem 1. The CHIP data included in our simulation studies shows that MDI-oob works in this setting. We plan to add this plot to our supplementary material. Reviewers 2 and 3: Give theoretical/empirical evidence that MDI-oob can "debias" MDI. Empirically, we compute the MDI-oob for the first simulation.


From global to local MDI variable importances for random forests and when they are Shapley values

Neural Information Processing Systems

Then, we derive a local MDI importance measure of variable relevance, which has a very natural connection with the global MDI measure and can be related to a new notion of local feature relevance. We further link local MDI importances with Shapley values and discuss them in the light of related measures from the literature.


Understanding variable importances in forests of randomized trees

Neural Information Processing Systems

Despite growing interest and practical use in various scientific areas, variable importances derived from tree-based ensemble methods are not well understood from a theoretical point of view. In this work we characterize the Mean Decrease Impurity (MDI) variable importances as measured by an ensemble of totally randomized trees in asymptotic sample and ensemble size conditions. We derive a three-level decomposition of the information jointly provided by all input variables about the output in terms of i) the MDI importance of each input variable, ii) the degree of interaction of a given input variable with the other input variables, iii) the different interaction terms of a given degree. We then show that this MDI importance of a variable is equal to zero if and only if the variable is irrelevant and that the MDI importance of a relevant variable is invariant with respect to the removal or the addition of irrelevant variables. We illustrate these properties on a simple example and discuss how they may change in the case of non-totally randomized trees such as Random Forests and Extra-Trees.
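The abstract's properties can be checked empirically. Below is a minimal sketch, not from the paper itself: scikit-learn's `ExtraTreesClassifier` with `max_features=1` approximates the totally randomized trees analyzed in the paper, and on an XOR target the two relevant variables receive large MDI scores (through interaction terms, since neither helps at the root alone) while the two irrelevant variables receive scores near zero. The dataset and parameters are illustrative choices, not the paper's experiments.

```python
# Sketch: empirical MDI importances from (approximately) totally randomized
# trees. max_features=1 makes each split variable chosen at random, mimicking
# the totally randomized trees studied in the paper. Setup is illustrative.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.RandomState(0)
n = 2000
X = rng.randint(0, 2, size=(n, 4))   # four binary inputs
y = X[:, 0] ^ X[:, 1]                # output depends only on X0 and X1;
                                     # X2 and X3 are irrelevant

forest = ExtraTreesClassifier(
    n_estimators=300, max_features=1, random_state=0
).fit(X, y)

mdi = forest.feature_importances_    # normalized MDI scores
print(mdi)                           # X0, X1 large; X2, X3 near zero
```

Note that at the root neither X0 nor X1 reduces impurity for an XOR target; their MDI mass comes from splits deeper in the tree, which is exactly the interaction decomposition the paper formalizes.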


Reviews: A Debiased MDI Feature Importance Measure for Random Forests

Neural Information Processing Systems

I am updating my score from 7 to 8. --- # Originality The main contributions are all original. While the take-home message of the study is in retrospect simple and obvious (compute MDI importances on out-of-bag samples), the paper provides an original analysis that explains and justifies this modification of the computation of MDI importances. Some remarks however: - I would have appreciated a controlled experiment where G0(T) can be computed exactly in order to empirically appreciate the (supposed) tightness of the bound. More specifically, what if A1 and A2 are not satisfied? In real-world setups, A1 is very unlikely to hold.
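The bias that motivates computing MDI on out-of-bag samples is easy to reproduce. The sketch below, an illustration rather than the paper's simulation, fits a forest on one informative binary feature and one pure-noise continuous feature: in-sample MDI assigns the noise feature substantial importance (deep trees memorize it to purify the label noise), while permutation importance on held-out data does not.

```python
# Sketch of the in-sample MDI bias that MDI-oob targets: a high-cardinality
# pure-noise feature receives inflated MDI, but near-zero held-out
# permutation importance. Illustrative setup, not the paper's experiment.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
n = 1000
signal = rng.randint(0, 2, size=n)         # binary informative feature
noise = rng.uniform(size=n)                # continuous pure-noise feature
X = np.column_stack([signal, noise])
y = (signal ^ (rng.uniform(size=n) < 0.1)).astype(int)  # 10% label noise

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

mdi = rf.feature_importances_              # in-sample MDI: noise inflated
perm = permutation_importance(
    rf, X_te, y_te, n_repeats=10, random_state=0
).importances_mean                         # held-out: noise near zero

print("MDI: ", mdi)
print("perm:", perm)
```

The fully grown trees keep splitting on the continuous noise feature to drive training impurity to zero, and every such split adds to its in-sample MDI; evaluating on samples the tree never saw removes that credit, which is the intuition behind MDI-oob.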


e3796ae838835da0b6f6ea37bcf8bcb7-Reviews.html

Neural Information Processing Systems

Variable importance measures are often used to highlight key predictor variables in tree-based ensemble models. However, there has generally been a lack of theoretical understanding of these measures. In this paper, the authors study the theoretical properties of the Mean Decrease Impurity (MDI) importance measures (such as the Gini importance). Through most of the paper they use the Shannon entropy as the impurity measure but show that the theorems and results are applicable to any impurity measure. They begin with an asymptotic analysis of totally randomized, fully developed tree ensembles learned using an infinitely large ensemble.


From global to local MDI variable importances for random forests and when they are Shapley values

Sutera, Antonio, Louppe, Gilles, Huynh-Thu, Van Anh, Wehenkel, Louis, Geurts, Pierre

arXiv.org Machine Learning

Random forests have been widely used for their ability to provide so-called importance measures, which give insight at a global (per dataset) level on the relevance of input variables to predict a certain output. On the other hand, methods based on Shapley values have been introduced to refine the analysis of feature relevance in tree-based models to a local (per instance) level. In this context, we first show that the global Mean Decrease of Impurity (MDI) variable importance scores correspond to Shapley values under some conditions. Then, we derive a local MDI importance measure of variable relevance, which has a very natural connection with the global MDI measure and can be related to a new notion of local feature relevance. We further link local MDI importances with Shapley values and discuss them in the light of related measures from the literature. The measures are illustrated through experiments on several classification and regression problems.
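The Shapley-value connection in this abstract can be made concrete with a brute-force computation. The sketch below is a generic exact Shapley solver over all feature orderings, applied to a toy value function (impurity removed by a feature subset) with an interaction between two features; it is an illustration of the concept, not the paper's MDI-based construction.

```python
# Brute-force Shapley values for an arbitrary set function v, illustrating
# how importance scores can be read as Shapley values. The toy game below
# is a hypothetical impurity-reduction function, not the paper's.
from itertools import permutations

def shapley(v, n):
    """Exact Shapley values phi_0..phi_{n-1} of the game v over {0,..,n-1}:
    each player's marginal contribution averaged over all join orders."""
    phi = [0.0] * n
    perms = list(permutations(range(n)))
    for order in perms:
        seen = frozenset()
        for j in order:
            phi[j] += v(seen | {j}) - v(seen)
            seen = seen | {j}
    return [p / len(perms) for p in phi]

# Toy game: feature 0 removes 0.5 of the impurity on its own; features 1
# and 2 remove the remaining 0.5 only jointly (a pure interaction).
def v(S):
    gain = 0.5 if 0 in S else 0.0
    if 1 in S and 2 in S:
        gain += 0.5
    return gain

phi = shapley(v, 3)
print(phi)   # interaction credit is shared equally between features 1 and 2
```

The efficiency property (the values sum to the total impurity removed, `v(full) - v(empty)`) is the same bookkeeping that lets the global MDI decomposition be reinterpreted as Shapley values under the paper's conditions.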